ABSTRACT:

In a system for recognizing a time sequence of feature vectors of a speech signal representative of an
unknown utterance as one of a plurality of reference patterns, a generator (11) for generating the reference
patterns has a converter (15) for converting a plurality of time sequences of feature vectors of an input pattern of
a speech signal with variances to a plurality of time sequences of feature codes with reference to code vectors
(14) which are previously prepared by the known clustering. A first pattern former (16) generates a state
transition probability distribution and an occurrence probability distribution of feature codes for each state in a
state transition network. A function generator (17) calculates parameters of a continuous Gaussian density function
from the code vectors and the occurrence probability distribution to produce the continuous Gaussian density
function approximating the occurrence probability distribution. A second pattern former (18) produces a
reference pattern defined by the state transition probability distribution and the continuous Gaussian density
function. For a plurality of different training words, a plurality of reference patterns are generated and are
memorized in the reference pattern generator.
Device for generating a reference pattern with a continuous probability density function derived
from feature code occurrence probability distribution.



DEVICE FOR GENERATING A REFERENCE PATTERN WITH A CONTINUOUS PROBABILITY DENSITY
FUNCTION DERIVED FROM FEATURE CODE OCCURRENCE PROBABILITY DISTRIBUTION

This invention relates to a speech recognition system and, in particular, to a device for producing a
reference pattern for use in the system.

In speech recognition systems, a speech signal having a pattern is analyzed by a feature analyzer to
produce a time sequence of feature vectors. The time sequence of feature vectors is compared with
reference patterns and is thereby identified as one of the reference patterns.

Considering variation of the pattern of the speech signal due to a plurality of utterances, the reference
patterns are generated from a number of training speeches.

One of the known speech recognition systems has a table memorizing a plurality of code vectors and a
plurality of feature codes corresponding thereto for vector quantizing the time sequence of feature vectors.
For example, such a speech recognition system using the table is described in an article contributed by
S. E. Levinson, L. R. Rabiner, and M. M. Sondhi to the Bell System Technical Journal, Volume 62, No. 4
(April 1983), pages 1035 to 1074, under the title of "An Introduction to the Application of the Theory of
Probabilistic Functions of a Markov Process to Automatic Speech Recognition".

According to the Levinson et al article, the speech recognition system comprises the code vector table
for memorizing a plurality of code vectors and a plurality of feature codes corresponding thereto.

On generating the reference pattern, a plurality of speech signals are used which are produced by a
plurality of utterances and are representative of the predetermined input pattern with variations. Connected
to the feature analyzer and to the code vector table, a converter is used in converting the plurality of feature
vector time sequences into a plurality of time sequences of feature codes, respectively, with reference to
the code vectors. A forming circuit is connected to the converter and has a state transition network or table.

The state transition network has a plurality of states which vary from one to another with a state
transition probability in accordance with time elapsing. Therefore, for the feature code time sequences, the
feature codes appear in each state in the state transition network. When attention is directed to a particular
code among the feature codes, the particular code has a probability of occurrence in each state in the
transition network.

The forming circuit is responsive to the feature code time sequences and calculates the state transition
probability distribution and the occurrence probability distribution of the feature codes for each state to
generate a reference pattern comprising both probability distributions.

In the Levinson et al speech recognition system, the reference pattern is generated in this manner in
response to each predetermined input pattern by a reference pattern generating device which comprises
the code vector table, the converter, and the forming circuit. The reference pattern generating device is
rapidly operable because the reference pattern can be obtained with relatively little calculation processing.
The reference pattern is, however, liable to cause erroneous speech recognition because of quantizing
error.

Another speech recognition system is disclosed in United States Patent No. 4,783,804 issued to
Biing-Hwang Juang et al. According to the Juang et al patent, a reference pattern generating device comprises a
speech analyzer and a function generator. The speech analyzer produces a plurality of feature vector time
sequences representative of a predetermined input pattern of a plurality of varieties. A function generator is
coupled to the speech analyzer and calculates, in response to the feature vector time sequences, a state
transition probability distribution in the state transition network and a probability density function which can
be understood to approximate a probability distribution of occurrence of the feature vectors for each
state. The function generator generates a reference pattern in response to the state transition probability
distribution and the probability density function.

The Juang et al reference pattern generating device can generate the reference pattern which enables
speech recognition with a reduced error because no vector quantization is used. The device is, however,
incapable of rapidly generating the reference pattern because the processing is increased for calculating the
reference pattern.



Summary of the Invention:

It is an object of the present invention to provide a reference pattern generating device which is capable
of rapidly generating the reference pattern which enables speech recognition with a reduced error.

It is another object of the present invention to provide a speech recognition system which is capable of
rapidly recognizing speech with little error.

As described above, a reference pattern generating device includes a feature analyzer responsive to a
speech signal representative of an input pattern for producing a time sequence of feature vectors
representative of the input pattern; a table for memorizing a plurality of code vectors and a plurality of
feature codes corresponding thereto; converting means connected to the feature analyzer and the table for
converting a plurality of time sequences of feature vectors to a plurality of time sequences of feature codes
with reference to the table, a plurality of the time sequences of the feature vectors being produced in
response to a plurality of speech signals including the first-mentioned speech signal; and first forming
means for forming, in response to a plurality of the time sequences of the feature codes, a state transition
probability in a state transition network and a probability density distribution of occurrence of the feature
codes in each state in the state transition network. According to the present invention, the reference pattern
generating device comprises: function generating means connected to the table and the first forming means
for generating a probability density function approximating the probability distribution with the code vectors
used as parameters in the function; and second forming means connected to the first forming means and
the function generating means for forming a reference pattern for a plurality of the speech signals, the
reference pattern being defined by the state transition probability distribution and the probability density
function.

According to an aspect of the present invention, the function generating means generates as the
probability density function a Gaussian probability density function which is expressed by:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-(x-\mu)^{2}/2\sigma^{2}},$$

where $\mu$ and $\sigma^{2}$ are a mean value and a covariance, respectively, the function generating means calculating
the mean value and the covariance in accordance with the following equations:

$$\mu = \sum_{i=1}^{I} R_i \cdot b_{pi}, \quad \text{and} \quad \sigma^{2} = \sum_{i=1}^{I} (R_i - \mu)^{2} \cdot b_{pi},$$

where $R_i$ is the code vectors, $b_{pi}$ being the feature code occurrence probabilities, $I$ being a number of the
code vectors.

In a speech recognition system using the reference pattern generating device, a feature vector time
sequence representative of an unknown speech signal is directly compared with the reference patterns
without being converted into a feature code time sequence so as to recognize the speech signal as one of
the reference patterns.



Brief Description of the drawings: 

Fig. 1 is a block diagram view of a speech recognition system according to an embodiment of the
present invention; and

Fig. 2 is a block diagram view of an identifier in the speech recognition system of Fig. 1.



Description of the Preferred Embodiment

Referring to Fig. 1, a speech recognition system shown therein comprises a feature analyzer 10 for
analyzing an input pattern of a speech signal to produce a time sequence of feature vectors representative
of the input pattern, a reference pattern generator 11 for generating and memorizing patterns of training
speeches as reference patterns, an identifier 12 for comparing a time sequence of feature vectors of a
speech signal of an unknown utterance with the reference patterns to identify the utterance, and a mode
selection switch 13 for selectively connecting the feature analyzer 10 to the reference pattern generator 11
and the identifier 12.

The feature analyzer 10 analyzes an input pattern of an incoming speech signal S due to an utterance
by a known analyzing method, such as the mel cepstrum or the linear prediction coding, and produces a
time sequence of feature vectors V. The time sequence of feature vectors V is represented by:

$$V = \{V_1, V_2, V_3, \ldots, V_t, \ldots, V_T\},$$

where $V_t$ represents a feature vector at a time instant t and T represents an entire time duration of the
incoming speech signal. Each of feature vectors $V_t$ is an N-order vector and is represented by:

$$V_t = \{v_{t1}, v_{t2}, v_{t3}, \ldots, v_{tn}, \ldots, v_{tN}\}.$$

The mode selection switch 13 is switched to the reference pattern generator 11 during a training mode.
Accordingly, the time sequence of feature vectors V is applied to the reference pattern generator 11 from
the feature analyzer 10 through the mode selection switch 13. The time sequence of feature vectors V
represents an input pattern of a training speech.

The reference pattern generator 11 comprises a code vector table 14 for memorizing a plurality of code
vectors and a plurality of feature codes corresponding thereto, a converting circuit 15 for converting the
time sequence of feature vectors V into a time sequence of feature codes with reference to the code vector
table 14, a first pattern forming circuit 16 responsive to a plurality of time sequences of feature codes for
forming a first pattern comprising a state transition probability distribution and a probability distribution of
occurrence of the feature codes for each state in a state transition network, a function generator 17 for
generating a probability density function from the probability distribution of occurrence of the feature codes
with reference to the code vector table 14, and a second pattern forming circuit 18 for forming a second
pattern which comprises the state transition probability distribution and the feature code occurrence
probability density function and holding the second pattern as the reference pattern.

The code vector table 14 memorizes a plurality of code vectors $R$ ($= \{R_1, R_2, R_3, \ldots, R_i, \ldots, R_I\}$, where I
is a number of code vectors). Each of code vectors $R_i$ is represented by:

$$R_i = \{r_{i1}, r_{i2}, r_{i3}, \ldots, r_{in}, \ldots, r_{iN}\}.$$

Each of these code vectors R is previously prepared from iterative utterance of a different known
vocabulary by the known clustering. Then, a feature code is determined for each of the code vectors R.
The code vector table 14 also memorizes a plurality of feature codes corresponding to the code
vectors, respectively.

The converting circuit 15 receives the time sequence of feature vectors V from the feature analyzer 10
and detects likelihood of the time sequence of feature vectors V and the code vectors R. The detection of
likelihood is effected by use of one of the known likelihood detecting methods. In the present embodiment, a
method is used where the square distance D is detected between each of the feature vectors $V_t$ and each
of the code vectors $R_i$ as follows:

$$D = \sum_{n=1}^{N} (v_{tn} - r_{in})^{2}.$$

Then, an optimum code vector $R_i$ is detected as a specific code vector which makes the square
distance D minimum, and a specific one of the feature codes $C_i$ is obtained in correspondence to the
optimum code vector $R_i$. Thus, the feature vector $V_t$ is converted into the specific feature code $C_i$. Similar
conversion is effected for all of feature vectors V and a time sequence of feature codes C is obtained for
the time sequence of feature vectors V. The time sequence of feature codes C is represented by:

$$C = \{C_{i1}, C_{i2}, C_{i3}, \ldots, C_{iT}\}.$$

The time sequence of feature codes C is applied to the first pattern forming circuit 16.
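The conversion performed by the converting circuit 15 can be sketched as a nearest-neighbour search over the code vector table. The following is only a minimal illustration, assuming that feature codes are simply the indices of the code vectors; the function names and the example values are hypothetical and do not appear in the specification.

```python
def squared_distance(v, r):
    """Square distance D between a feature vector v and a code vector r."""
    return sum((vn - rn) ** 2 for vn, rn in zip(v, r))


def quantize(feature_vectors, code_vectors):
    """Convert each feature vector into the index of its nearest code vector."""
    codes = []
    for v in feature_vectors:
        distances = [squared_distance(v, r) for r in code_vectors]
        # The optimum code vector is the one that makes D minimum.
        codes.append(distances.index(min(distances)))
    return codes


# Hypothetical two-dimensional code vector table and feature vector sequence.
code_vectors = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
feature_vectors = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.4), (1.1, 0.8)]
codes = quantize(feature_vectors, code_vectors)
print(codes)  # [0, 1, 2, 1]
```

Ties are resolved here in favour of the lowest index, which is a choice made for the sketch only.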

A similar process is repeated a predetermined number of times for iterative utterance of the same known
training vocabulary. When the utterance is repeated K times, K time sequences of feature codes are
obtained. The K time sequences of feature codes are represented by $C_1, C_2, C_3, \ldots, C_K$, respectively, and
are collectively represented by $C_k$ ($1 \le k \le K$).

The first pattern forming circuit 16 has the state transition network or table. The first pattern forming
circuit 16 receives the K time sequences of feature codes and carries out extrapolation of an optimum
state transition probability distribution A and a probability distribution B of occurrence of the feature codes
for each state in the state transition network from $C_k$ by the Baum-Welch algorithm.
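One re-estimation pass of the Baum-Welch algorithm over K feature code sequences may be sketched as follows. This is an illustrative sketch under stated assumptions rather than the circuit of the embodiment: the network is represented by a full P-by-P transition matrix, the initial state distribution `pi` is held fixed, probabilities are not scaled (so it suits only short sequences), and all names and example values are hypothetical.

```python
import math


def forward(seq, A, B, pi):
    """alpha[t][p]: probability of seq[:t+1] with the network in state p at time t."""
    P = len(A)
    alpha = [[pi[p] * B[p][seq[0]] for p in range(P)]]
    for c in seq[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[p] * A[p][q] for p in range(P)) * B[q][c]
                      for q in range(P)])
    return alpha


def backward(seq, A, B):
    """beta[t][p]: probability of seq[t+1:] given state p at time t."""
    P = len(A)
    beta = [[1.0] * P]
    for c in reversed(seq[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[p][q] * B[q][c] * nxt[q] for q in range(P))
                        for p in range(P)])
    return beta


def log_likelihood(sequences, A, B, pi):
    """Sum of log P(C_k | A, B) over all training sequences."""
    return sum(math.log(sum(forward(s, A, B, pi)[-1])) for s in sequences)


def baum_welch_step(sequences, A, B, pi):
    """Return re-estimated (A, B) after one pass over all sequences."""
    P, num_codes = len(A), len(B[0])
    trans_num = [[0.0] * P for _ in range(P)]
    trans_den = [0.0] * P
    emit_num = [[0.0] * num_codes for _ in range(P)]
    emit_den = [0.0] * P
    for seq in sequences:
        alpha, beta = forward(seq, A, B, pi), backward(seq, A, B)
        likelihood = sum(alpha[-1])
        for t, c in enumerate(seq):
            for p in range(P):
                gamma = alpha[t][p] * beta[t][p] / likelihood  # state occupancy
                emit_num[p][c] += gamma
                emit_den[p] += gamma
                if t + 1 < len(seq):
                    trans_den[p] += gamma
                    for q in range(P):
                        trans_num[p][q] += (alpha[t][p] * A[p][q] * B[q][seq[t + 1]]
                                            * beta[t + 1][q] / likelihood)
    # Rows with no accumulated evidence keep their previous values.
    new_A = [[trans_num[p][q] / trans_den[p] if trans_den[p] > 0 else A[p][q]
              for q in range(P)] for p in range(P)]
    new_B = [[emit_num[p][i] / emit_den[p] if emit_den[p] > 0 else B[p][i]
              for i in range(num_codes)] for p in range(P)]
    return new_A, new_B


# Hypothetical two-state left-to-right network with three feature codes.
A = [[0.6, 0.4], [0.0, 1.0]]
B = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
pi = [1.0, 0.0]
sequences = [[0, 0, 1, 2, 2], [0, 1, 1, 2], [0, 0, 2, 2, 2]]
old_ll = log_likelihood(sequences, A, B, pi)
new_A, new_B = baum_welch_step(sequences, A, B, pi)
new_ll = log_likelihood(sequences, new_A, new_B, pi)
```

Repeating `baum_welch_step` until the log likelihood stops improving gives the distributions A and B; a practical implementation would add scaling or work in the log domain to avoid underflow on long sequences.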

The state transition probability distribution A and the feature code occurrence probability distribution B
are represented by:




EP 0 328 064 A2 



$$A = \{A_1, A_2, A_3, \ldots, A_p, \ldots, A_P\} \quad \text{and}$$

$$B = \{B_1, B_2, B_3, \ldots, B_p, \ldots, B_P\},$$

respectively. P is a number of states. Assuming that $A_1, A_2, A_3, \ldots, A_p, \ldots,$ and $A_P$ are collectively
represented by $A_p$ and $B_1, B_2, B_3, \ldots, B_p, \ldots,$ and $B_P$ are collectively represented by $B_p$ ($1 \le p \le P$), $A_p$ and $B_p$
are given by:

$$A_p = \{a_{p1}, a_{p2}, a_{p3}, \ldots, a_{pQ}\} \quad \text{and}$$

$$B_p = \{b_{p1}, b_{p2}, b_{p3}, \ldots, b_{pI}\},$$

respectively. Q is a number of states to which transition is possible from the state p. Accordingly, $a_{pq}$
($1 \le q \le Q$) represents a transition probability from the state p to the q-th of those states, while $b_{pi}$ ($1 \le i \le I$)
represents an occurrence probability of the feature code corresponding to the code vector $R_i$ in the state p.

Thus, a first pattern is formed which comprises the state transition probability distribution A and the
feature code occurrence probability distribution B.

The state transition probability distribution A is applied to the second pattern forming circuit 18 from the
first pattern forming circuit 16 while the feature code occurrence probability distribution B is applied to the
function generator 17.

The function generator 17 produces an approximate continuous probability density function from the
feature code occurrence probability distribution B with reference to code vectors R in the code vector table
14.

The Gaussian probability density function and the Poisson probability density function can be used as
the approximate probability density function.

In the present embodiment, the Gaussian probability density function is used. The Gaussian probability
density function is represented by:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-(x-\mu)^{2}/2\sigma^{2}}.$$

Parameters $\mu$ and $\sigma^{2}$ are a mean value and a covariance, respectively. In the embodiment, the mean value
and the covariance are ones of the code vectors R. Therefore, those parameters $\mu$ and $\sigma^{2}$ are obtained by
the following equations:

$$\mu = \sum_{i=1}^{I} R_i \cdot b_{pi}, \quad \text{and}$$

$$\sigma^{2} = \sum_{i=1}^{I} (R_i - \mu)^{2} \cdot b_{pi}.$$

$R_i$ is read from the code vector table 14 and $b_{pi}$ is given by the feature code occurrence probability
distribution B.
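The calculation carried out by the function generator 17 for one state p can be illustrated numerically. The sketch below assumes that the mean and the variance are taken separately for each vector component (that is, a diagonal covariance), which is an assumption of the sketch and is not stated in the specification; the function names and the example values are likewise hypothetical.

```python
import math


def gaussian_parameters(code_vectors, occurrence_probs):
    """Per-component mean and variance for one state p.

    mean[n] = sum_i r_in * b_pi
    var[n]  = sum_i (r_in - mean[n])**2 * b_pi
    """
    dims = len(code_vectors[0])
    mean = [sum(r[n] * b for r, b in zip(code_vectors, occurrence_probs))
            for n in range(dims)]
    var = [sum((r[n] - mean[n]) ** 2 * b
               for r, b in zip(code_vectors, occurrence_probs))
           for n in range(dims)]
    return mean, var


def gaussian_density(x, mean, var):
    """Product of one-dimensional Gaussian densities (diagonal covariance assumed)."""
    density = 1.0
    for xn, m, s2 in zip(x, mean, var):
        density *= math.exp(-(xn - m) ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    return density


# Hypothetical code vectors R_i and occurrence probabilities b_pi for one state p.
code_vectors = [(0.0, 0.0), (2.0, 0.0), (2.0, 4.0)]
b_p = [0.5, 0.25, 0.25]
mean, var = gaussian_parameters(code_vectors, b_p)
print(mean, var)  # [1.0, 1.0] [1.0, 3.0]
```

If all of the occurrence probability in a state falls on a single code vector, the variance becomes zero and `gaussian_density` divides by zero; a practical implementation would need a variance floor for that case.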

Thus, the function generator 17 produces the approximate continuous probability density function $B_c$
which is applied to the second pattern forming circuit 18.

The second pattern forming circuit 18 receives the state transition probability distribution A from the
first pattern forming circuit 16 and the approximate continuous probability density function $B_c$ from the
function generator 17 and combines them to form a second pattern. The second pattern forming circuit 18
memorizes the second pattern as the reference pattern P.

Reference patterns are generated and memorized for different training speeches in a manner similar
to that described above.

In the recognition mode, the mode selection switch 13 is switched to the identifier 12.

The feature analyzer 10 receives the speech signal S due to an unknown utterance and produces the
time sequence of feature vectors V as $V_s$. The time sequence of feature vectors $V_s$ is applied to the
identifier 12 through the mode selection switch 13.

Referring to Fig. 2, the identifier 12 comprises a probability generator 21 and a selector 22.

The probability generator 21 is connected to the second pattern forming circuit 18 and the feature
analyzer 10. The probability generator 21 generates occurrence probabilities $P(V|P)$ of the time sequence of
feature vectors $V_s$ for all of the reference patterns P. Each of the probabilities $P(V|P)$ can be calculated by
use of the Viterbi algorithm with the dynamic programming technique or the Forward-Backward algorithm.
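The identification step can be sketched with the forward recursion applied directly to unquantized feature vectors, followed by selection of the reference pattern that makes the occurrence probability maximum. This is a minimal sketch under several assumptions that are not taken from the specification: each reference pattern is a dictionary holding a full transition matrix `A` and per-state Gaussian means and variances, every path starts in state 0, the covariance is diagonal, and no scaling is applied. The pattern names and values are hypothetical.

```python
import math


def gaussian_density(x, mean, var):
    """Diagonal-covariance Gaussian density of feature vector x."""
    density = 1.0
    for xn, m, s2 in zip(x, mean, var):
        density *= math.exp(-(xn - m) ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    return density


def forward_probability(features, pattern):
    """P(V | pattern) by the forward recursion, starting in state 0."""
    trans, means, variances = pattern["A"], pattern["mean"], pattern["var"]
    num_states = len(trans)
    alpha = [0.0] * num_states
    alpha[0] = gaussian_density(features[0], means[0], variances[0])
    for x in features[1:]:
        alpha = [
            sum(alpha[p] * trans[p][q] for p in range(num_states))
            * gaussian_density(x, means[q], variances[q])
            for q in range(num_states)
        ]
    return sum(alpha)


def identify(features, reference_patterns):
    """Return the name of the reference pattern with maximum P(V | pattern)."""
    return max(reference_patterns,
               key=lambda name: forward_probability(features, reference_patterns[name]))


# Two hypothetical reference patterns over one-dimensional feature vectors.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
reference_patterns = {
    "low_high": {"A": left_to_right, "mean": [(0.0,), (5.0,)], "var": [(1.0,), (1.0,)]},
    "high_low": {"A": left_to_right, "mean": [(5.0,), (0.0,)], "var": [(1.0,), (1.0,)]},
}
rising = [(0.1,), (0.2,), (4.9,), (5.1,)]
falling = [(5.1,), (4.9,), (0.2,), (0.1,)]
print(identify(rising, reference_patterns))   # low_high
print(identify(falling, reference_patterns))  # high_low
```

Because the feature vectors are scored against continuous densities, no vector quantization is needed at recognition time. For utterances of realistic length the products underflow, so log probabilities or scaling would be used instead.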

Claims

1. A reference pattern generating device including:
a feature analyzer responsive to a speech signal representative of an input pattern for producing a time
sequence of feature vectors representative of said input pattern; a table for memorizing a plurality of code
vectors and a plurality of feature codes corresponding thereto; converting means connected to said feature
analyzer and said table for converting a plurality of time sequences of feature vectors to a plurality of time
sequences of feature codes with reference to said table, said plurality of time sequences of the feature
vectors being produced in response to a plurality of speech signals including the first-mentioned speech
signal; and first forming means for forming, in response to said plurality of time sequences of the feature
codes, a state transition probability in a state transition network and a probability density distribution of
occurrence of the feature codes in each state in said state transition network;
wherein the improvement comprises:
function generating means connected to said table and said first forming means for generating a probability
density function approximating said probability distribution with said code vectors used as parameters in
said function; and
second forming means connected to said first forming means and said function generating means for
forming a reference pattern for said plurality of speech signals, said reference pattern being defined by said
state transition probability distribution and said probability density function.
2. A device as claimed in Claim 1, wherein said function generating means generates as the probability density
function a Gaussian probability density function which is expressed by:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-(x-\mu)^{2}/2\sigma^{2}},$$

where $\mu$ and $\sigma^{2}$ are a mean value and a covariance, respectively, said function generating means
calculating the mean value and the covariance in accordance with the following equations:

$$\mu = \sum_{i=1}^{I} R_i \cdot b_{pi}, \quad \text{and} \quad \sigma^{2} = \sum_{i=1}^{I} (R_i - \mu)^{2} \cdot b_{pi},$$

where $R_i$ is said code vectors, $b_{pi}$ being the feature code occurrence probabilities, $I$ being a number of said
code vectors.

3. A speech recognition system for recognizing a speech, which comprises:
a feature analyzer responsive to a speech signal representative of an input pattern for producing a time
sequence of feature vectors representative of said input pattern;
mode selection switch means for selecting one of a training mode and a recognition mode;
reference pattern generating means being coupled with said feature analyzer through said mode selection
switch means selecting said training mode and for generating and memorizing a plurality of reference
patterns;
said reference pattern generating means comprises:
a table for memorizing a plurality of code vectors and a plurality of feature codes corresponding thereto;
converting means connected to said feature analyzer and said table for converting a plurality of time
sequences of feature vectors to a plurality of time sequences of feature codes with reference to said table,
said plurality of time sequences of the feature vectors being produced in response to a plurality of speech
signals including the first-mentioned speech signal;
first forming means for forming, in response to said plurality of time sequences of the feature codes, a first
pattern comprising a state transition probability in a state transition network and a probability density
distribution of occurrence of the feature codes in each state in said state transition network;
function generating means connected to said table and said first forming means for generating a probability
density function approximating said probability distribution with said code vectors used as parameters in
said function; and
second forming means connected to said first forming means and said function generating means for
forming and memorizing a second pattern for said plurality of speech signals, said second pattern being
defined by said state transition probability and said probability density function, said second forming means
memorizing said second pattern as one of said reference patterns; and
identifying means connected to said second forming means and connected to said feature analyzer through
said mode selection switch means when recognizing an unknown speech signal for identifying, in response
to an identifying time sequence of feature vectors representative of said unknown speech signal as the time
sequence of feature vectors from said feature analyzer, said identifying time sequence of feature vectors as
one of said reference patterns in said second forming means.

4. A system as claimed in Claim 3, wherein said identifying means comprises:
generating means coupled with said second forming means and responsive to said identifying time
sequence of feature vectors for generating an occurrence probability of said identifying time sequence of
feature vectors for each of the reference patterns; and
selecting means coupled with said generating means for selecting a specific one of the reference patterns
which makes the occurrence probability maximum to produce said specific reference pattern as an
identifying output.